Recently, several studies have applied deep convolutional neural networks (CNNs) to the image compressive sensing (CS) task to improve reconstruction quality. However, convolutional layers generally have small receptive fields, so capturing long-range pixel correlations with a CNN is challenging, which limits its reconstruction performance on image CS tasks. In view of this limitation, we propose a U-shaped Transformer for the image CS task, dubbed Uformer-ICS. We develop a projection-based Transformer block by integrating the prior projection knowledge of CS into the original Transformer block, and then build a symmetric reconstruction model using projection-based Transformer blocks and residual convolutional blocks. Compared with previous CNN-based CS methods, which can only exploit local image features, the proposed reconstruction model can simultaneously utilize both the local features and the long-range dependencies of an image, as well as the prior projection knowledge of CS theory. In addition, we design an adaptive sampling model that can adaptively sample image blocks based on block sparsity, which ensures that the compressed measurements retain the maximum possible information of the original image under a fixed sampling ratio. The proposed Uformer-ICS is an end-to-end framework that learns the sampling and reconstruction processes simultaneously. Experimental results demonstrate that its reconstruction performance is significantly superior to that of existing deep-learning-based CS methods.
Translated by Google Translate
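The adaptive sampling idea above can be sketched in a few lines: estimate a per-block sparsity proxy and hand the fixed measurement budget preferentially to less sparse (more informative) blocks. This is an illustrative NumPy sketch, not the paper's learned sampling model; the sparsity proxy and the proportional allocation rule are assumptions.

```python
import numpy as np

def split_blocks(img, B):
    """Split an image into non-overlapping B x B blocks (row-major order)."""
    H, W = img.shape
    return [img[i:i + B, j:j + B] for i in range(0, H, B) for j in range(0, W, B)]

def adaptive_allocation(img, B=8, ratio=0.25):
    """Allocate per-block measurement counts in proportion to block
    'richness' (1 - sparsity), keeping the total at ratio * num_pixels."""
    blocks = split_blocks(img, B)
    # Sparsity proxy (an assumption): share of pixels close to the block mean.
    sparsity = np.array([np.mean(np.abs(b - b.mean()) < 0.05) for b in blocks])
    weight = (1.0 - sparsity) + 1e-6          # avoid an all-zero allocation
    budget = int(round(ratio * img.size))     # fixed total measurement count
    m = np.floor(weight / weight.sum() * budget).astype(int)
    m[np.argsort(-weight)[: budget - m.sum()]] += 1   # hand out the remainder
    return m
```

For an image whose left half is flat, the textured right-hand blocks receive nearly the whole budget while the flat blocks get almost none, yet the total stays exactly at the target sampling ratio.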
Image forensics is a rising topic, as trustworthy multimedia content is critical to modern society. Like other vision-related applications, forensic analysis relies heavily on a proper image representation. Despite its importance, current theoretical understanding of such representations remains limited, and their key role has been neglected to varying degrees. To address this gap, we attempt to investigate forensics-oriented image representation as a distinct problem, from the perspectives of theory, implementation, and application. Our work begins with an abstraction of the fundamental principles that a forensic representation should satisfy, in particular revealing the criticality of robustness, interpretability, and coverage. At the theoretical level, we propose a new representation framework for forensics, called Dense Invariant Representation (DIR), which is characterized by stable descriptions with mathematical guarantees. At the implementation level, the discrete computation problems of DIR are discussed, and corresponding accurate and fast solutions with generic properties and constant complexity are designed. We demonstrate the above arguments on dense-domain pattern detection and matching experiments, providing comparative results with state-of-the-art descriptors. Furthermore, at the application level, the proposed DIR is initially explored in passive and active forensics, namely copy-move forgery detection and perceptual hashing, and shows benefits in meeting the requirements of such forensic tasks.
In recent years, many backdoor attacks based on training-data poisoning have been proposed. In practice, however, these backdoor attacks are vulnerable to image compression: when a backdoor instance is compressed, the features of the specific backdoor trigger are destroyed, which can cause the attack performance of the backdoor to deteriorate. In this paper, we propose a compression-resistant backdoor attack based on feature-consistency training. To the best of our knowledge, this is the first backdoor attack that is robust to image compression. First, both the backdoor images and their compressed versions are fed into a deep neural network (DNN) for training. Then, the features of each image are extracted from an internal layer of the DNN. Next, the feature difference between each backdoor image and its compressed version is minimized. As a result, the DNN treats the features of a compressed image as the features of the corresponding backdoor image in feature space. After training, the backdoor attack against the DNN is robust to image compression. Furthermore, we consider three different image compression methods (i.e., JPEG, JPEG2000, WebP) so that the backdoor attack is robust to multiple image compression algorithms. Experimental results demonstrate the effectiveness and robustness of the proposed backdoor attack: when the backdoor instances are compressed, the attack success rate of common backdoor attacks is lower than 10%, while the attack success rate of our compression-resistant backdoor attack is greater than 97%. The compression-resistant attack remains robust even under low-quality compression. Moreover, extensive experiments demonstrate that our compression-resistant backdoor attack generalizes to image compression methods that were not used in the training process.
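The core of the method above is a feature-consistency term that pulls the internal features of a backdoor image and its compressed copy together. The toy sketch below uses a fixed random linear-ReLU map as a stand-in for the DNN's internal layer (an assumption); in the actual attack this term is minimized jointly with the usual classification loss during training.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # stand-in for an internal DNN layer (assumption)

def features(x):
    """Toy ReLU feature map playing the role of an internal DNN layer."""
    return np.maximum(W @ x, 0.0)

def consistency_loss(x_backdoor, x_compressed):
    """Mean squared distance between the internal features of a backdoor
    image and those of its compressed version."""
    d = features(x_backdoor) - features(x_compressed)
    return float(np.mean(d * d))
```

Driving this loss toward zero makes the network represent the compressed backdoor instance the same way as the uncompressed one, which is why the trigger survives compression.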
Image representation is an important topic in computer vision and pattern recognition. It plays a fundamental role in understanding visual content across a range of applications. Moment-based image representations have been reported to be effective in satisfying the core conditions of semantic description owing to their beneficial mathematical properties, especially geometric invariance and independence. This paper presents a comprehensive survey of orthogonal moments for image representation, covering recent advances in fast/accurate computation, robustness/invariance optimization, definition extensions, and applications. We also create a software package for a variety of widely used orthogonal moments and evaluate these methods on the same basis. The presented theoretical analysis, software implementation, and evaluation results can support the community, particularly in developing novel techniques and promoting real-world applications.
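As a concrete taste of the geometric invariance that motivates moment-based representation, the sketch below computes the first Hu invariant from normalized central moments. Note that Hu invariants are built from geometric moments, which are simpler than the orthogonal families this survey covers, but they exhibit the same translation and rotation invariance.

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq: moments taken about the image centroid,
    which makes them invariant to translation."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    m00 = img.sum()
    xb, yb = (xs * img).sum() / m00, (ys * img).sum() / m00
    return ((xs - xb) ** p * (ys - yb) ** q * img).sum()

def hu1(img):
    """First Hu invariant: eta20 + eta02, where eta_pq is the normalized
    central moment mu_pq / m00^(1 + (p+q)/2)."""
    m00 = img.sum()
    eta = lambda p, q: central_moment(img, p, q) / m00 ** (1 + (p + q) / 2)
    return eta(2, 0) + eta(0, 2)
```

The value is unchanged when the image is rotated by 90 degrees (mu20 and mu02 swap) or translated inside a larger canvas (the centroid moves with the content), illustrating the invariance property that the surveyed orthogonal moments pursue with better numerical behavior.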
During the deployment of deep neural networks (DNNs) on edge devices, many research efforts have been devoted to coping with limited hardware resources. However, little attention is paid to the influence of dynamic power management. As edge devices typically have only a limited energy budget from batteries (rather than the almost unlimited energy supply of servers or workstations), their dynamic power management often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed performance, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework to work with dynamic power management using DVFS. The framework can use only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i.e., keeping the difference in speed performance under various execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces their variance of inference latency across various frequencies, with the minimal memory consumption of only one model and one soft mask.
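The one-set-of-weights, many-models idea can be sketched as thresholding a shared soft mask at different quantiles: each target pruning ratio (tied to a DVFS frequency level) selects a different subset of the same weights. A simplified sketch under that assumption; the actual framework learns the masks and auxiliary parameters jointly.

```python
import numpy as np

def configure(weights, mask_scores, prune_ratio):
    """Derive one pruned model from a shared weight set: zero out the
    prune_ratio fraction of weights with the lowest soft-mask scores."""
    k = int(prune_ratio * weights.size)
    # Threshold at the (k+1)-th smallest score so exactly k weights drop
    # (assuming distinct scores).
    thresh = np.sort(mask_scores, axis=None)[k] if k > 0 else -np.inf
    return weights * (mask_scores >= thresh)
```

Switching the device to a lower execution frequency then simply means re-configuring with a larger `prune_ratio`; no extra copies of the weights need to be stored.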
Deep-learning-based super-resolution (SR) has gained great popularity in recent years because of its high image-quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computation and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search framework with adaptive SR blocks that performs depth search and per-layer width search. Inference speed is incorporated into the search objective directly, together with the SR loss, to derive SR models with high image quality while satisfying real-time inference requirements. A speed model incorporating compiler optimizations is leveraged in each iteration of the search process to predict the inference latency of SR blocks with various width configurations on the mobile device, for faster convergence. With the proposed framework, we achieve real-time SR inference at 720p resolution with competitive SR performance on the GPU/DSP of a mobile platform (Samsung Galaxy S21).
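The search loop described above, trading predicted latency against SR quality, can be illustrated with a toy exhaustive search over per-layer widths. The lookup table of (latency, quality) pairs plays the role of the compiler-aware speed model and is entirely made up for illustration.

```python
import itertools

# Hypothetical per-layer table: width -> (predicted latency in ms, quality gain).
TABLE = [
    {16: (1.0, 1.0), 32: (1.8, 2.5), 64: (3.5, 5.0)},
    {16: (0.8, 0.5), 32: (1.5, 2.0), 64: (3.0, 3.0)},
]

def search(table, budget_ms):
    """Pick per-layer widths maximizing predicted quality subject to a
    real-time latency budget."""
    best_q, best_cfg = -1.0, None
    for cfg in itertools.product(*[d.keys() for d in table]):
        lat = sum(table[i][w][0] for i, w in enumerate(cfg))
        q = sum(table[i][w][1] for i, w in enumerate(cfg))
        if lat <= budget_ms and q > best_q:
            best_q, best_cfg = q, cfg
    return best_cfg, best_q
```

Real NAS replaces the exhaustive loop with a differentiable or evolutionary search, but the objective has the same shape: the latency predictor prunes configurations that would miss the real-time deadline.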
The Bayesian Lasso is constructed in the linear regression framework and applies Gibbs sampling to estimate the regression parameters. This paper develops a new sparse learning model, named the Bayesian Lasso Sparsity (BLS) model, that adopts the hierarchical model formulation of the Bayesian Lasso. The main difference from the original Bayesian Lasso lies in the estimation procedure: the BLS method uses a learning algorithm based on a type-II maximum likelihood procedure. In contrast to the Bayesian Lasso, BLS provides sparse estimates of the regression parameters. The BLS method is also extended to nonlinear supervised learning problems by introducing kernel functions. We compare the BLS model with the well-known relevance vector machine, the fast Laplace method, the Bayesian Lasso, and the Lasso on simulated and real data. Numerical results show that the BLS is sparse and precise, especially when dealing with noisy and irregular datasets.
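To make the type-II maximum likelihood idea concrete, below is a generic evidence-maximization loop in the style of sparse Bayesian learning (the relevance-vector-machine updates), which drives per-weight precisions of irrelevant coefficients to large values and thus yields sparsity. These are not the exact BLS update rules, and the fixed noise precision is an assumption.

```python
import numpy as np

def type2_ml_regression(Phi, y, n_iter=30, beta=1e4):
    """Iterate evidence-maximization updates for per-weight precisions
    alpha; a large alpha_i drives weight i to zero, yielding sparsity."""
    N, M = Phi.shape
    alpha = np.ones(M)
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ y           # posterior mean of weights
        gamma = 1.0 - alpha * np.diag(Sigma)    # effective degrees of freedom
        alpha = np.clip(gamma / (mu ** 2 + 1e-12), 1e-6, 1e12)
    return mu, alpha
```

Unlike Gibbs sampling in the original Bayesian Lasso, this point-estimate procedure is deterministic and returns exactly sparse solutions once the large-alpha weights are pruned.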
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot benefit, or benefit only marginally, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, inputs, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
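Finding 1 above, distilling token relations, can be sketched as matching the student's token-to-token similarity distributions to the teacher's, e.g. with a cross-entropy loss. This toy version uses raw token features in place of the attention Q/K projections, which is an assumption for compactness.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_relations(tokens):
    """Token-to-token relation map: row-normalized scaled dot products."""
    d = tokens.shape[-1]
    return softmax(tokens @ tokens.T / np.sqrt(d))

def relation_distill_loss(student_tokens, teacher_tokens):
    # Cross-entropy from teacher relation rows to student relation rows.
    s = token_relations(student_tokens)
    t = token_relations(teacher_tokens)
    return float(-(t * np.log(s + 1e-9)).sum(axis=-1).mean())
```

By Gibbs' inequality the loss is minimized exactly when the student reproduces the teacher's relation map, so the objective transfers relational structure between tokens rather than raw feature values.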
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
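A NAIVEATTACK-style injection, adding a fixed trigger to the raw data before distillation begins, can be sketched as stamping a corner patch and retargeting labels. The patch shape, position, and value here are illustrative assumptions; DOORPING would instead keep re-optimizing the trigger throughout the distillation procedure.

```python
import numpy as np

def naive_trigger(images, labels, target_label, patch=3):
    """Stamp a white square in the bottom-right corner of every image and
    flip all labels to the attacker's target class."""
    poisoned = images.copy()                 # leave the clean data untouched
    poisoned[:, -patch:, -patch:] = 1.0      # fixed trigger pattern
    return poisoned, np.full_like(labels, target_label)
```

Because the poisoned pairs enter the distillation input rather than the downstream training set, the trigger-target association gets baked into the synthetic dataset itself, which is what distinguishes these attacks from classical training-time poisoning.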